The purpose of this notebook is to provide more examples of estimating Normal Error Component Mixed Logit Models. The models estimated in this notebook use mixing distributions to account for heteroskedasticity and/or non-zero correlations between the error terms of the various utility equations.
As in some of the other example notebooks, the dataset being used is the "Travel Mode Choice" dataset from Greene and Hensher. It is described on the statsmodels website, and their description is reproduced below in full. The basic specification to be estimated will come from the example given in Econometric Analysis by William H. Greene (2011), section 23.11.7.
The data, collected as part of a 1987 intercity mode choice study, are a sub-sample of 210 non-business trips between Sydney, Canberra and Melbourne in which the traveler chooses a mode from four alternatives (plane, car, bus and train). The sample, 840 observations, is choice based with over-sampling of the less popular modes (plane, train and bus) and under-sampling of the more popular mode, car. The level of service data was derived from highway and transport networks in Sydney, Melbourne, non-metropolitan N.S.W. and Victoria, including the Australian Capital Territory.

Number of observations: 840 observations on 4 modes for 210 individuals.
Number of variables: 8

Variable name definitions:
individual = 1 to 210
mode = 1 - air; 2 - train; 3 - bus; 4 - car
choice = 0 - no; 1 - yes
ttme = terminal waiting time for plane, train and bus (minutes); 0 for car
invc = in-vehicle cost for all stages (dollars)
invt = travel time (in-vehicle time) for all stages (minutes)
gc = generalized cost measure: invc + (invt * value of travel time savings) (dollars)
hinc = household income ($1000s)
psize = traveling group size in mode chosen (number)

Source: Greene, W.H. and D. Hensher (1997) "Multinomial logit and discrete choice models" in Greene, W. H. (1997) LIMDEP version 7.0 user's manual revised, Plainview, New York: Econometric Software, Inc. Download from the online complements to Greene, W.H. (2011) Econometric Analysis, 7th Edition, Prentice Hall (data table F18-2): http://people.stern.nyu.edu/wgreene/Text/Edition7/TableF18-2.csv
In [1]:
# Ordered dictionaries are used to store our
# model specifications
from collections import OrderedDict
# Pandas is used for data input/output
import pandas as pd
# Numpy is used for vectorized math operations
import numpy as np
# Statsmodels is used to retrieve the dataset
import statsmodels.datasets
# Pylogit is used to estimate the model
import pylogit as pl
In [2]:
# Access the dataset
mode_data = statsmodels.datasets.modechoice.load_pandas()
# Get a pandas dataframe of the mode choice data
mode_df = mode_data["data"]
# Look at a few rows of the data
mode_df.head()
Out[2]:
Even though we want to estimate a mixed logit model, we start by specifying a basic MNL model. This will give us "good" starting values for the mixed logit model.
The utility equations used by Greene are as follows. Note that $i$ refers to a particular individual and $j$ refers to a particular travel mode. Moreover, $d_{i, \textrm{condition}}$ is an indicator that equals 1 for individual $i$ when the condition is true.
$\begin{align}
U_{ij} &= \alpha _{\textrm{air}} d_{i,\ j=\textrm{air}} +
\alpha _{\textrm{train}} d_{i,\ j=\textrm{train}} +
\alpha _{\textrm{bus}} d_{i,\ j=\textrm{bus}} + \\
&\quad \ \beta _{\textrm{generalized_cost}} \textrm{GC}_{ij} +
\beta _{\textrm{terminal_time}} \textrm{TTME}_{ij} +
\beta _{\textrm{household_income}} d_{i,\ j=\textrm{air}} \textrm{HINC}_{i} +
\epsilon _{ij}
\end{align}
$
In other words, there is an alternative specific constant (i.e. an intercept) for the air, train, and bus modes, and car is used as the reference mode. Moreover, the alternative-specific variables (i.e. the generalized cost and terminal time variables) are specified as having generic effects across all modes. That is, the impact of a one-minute increase in terminal time or a \$1 increase in "generalized cost" on the utility of a given mode is a priori assumed to be the same for all modes. In contrast to the alternative-specific variables, household income is an individual characteristic, and it is a priori assumed to affect only the utility of traveling by air.
In [3]:
# Initialize the Ordered Dictionaries to store the model
# specification and parameter names
specification_dict = OrderedDict()
name_dict = OrderedDict()
# Add the variables to the specification and name dictionaries.
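# Note: in a PyLogit specification dictionary, each key is a
# column of the dataframe (or "intercept"). A list of alternative
# ids, e.g. [1, 2, 3], creates a separate alternative-specific
# coefficient for each listed alternative, while a list containing
# a single list of ids, e.g. [[1, 2, 3, 4]], creates one generic
# coefficient shared by all of those alternatives.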
specification_dict["intercept"] = [1, 2, 3]
name_dict["intercept"] = ["ASC Air", "ASC Train", "ASC Bus"]
specification_dict["gc"] = [[1, 2, 3, 4]]
name_dict["gc"] = ["Generalized Cost (all modes)"]
specification_dict["ttme"] = [[1, 2, 3, 4]]
name_dict["ttme"] = ["Terminal Time (all modes)"]
specification_dict["hinc"] = [1]
name_dict["hinc"] = ["Household Income (air mode)"]
In [4]:
# Estimate the MNL model
# Initialize the model object
mnl_model = pl.create_choice_model(data=mode_df,
                                   alt_id_col="mode",
                                   obs_id_col="individual",
                                   choice_col="choice",
                                   specification=specification_dict,
                                   model_type="MNL",
                                   names=name_dict)
# Create the initial values for the parameters being estimated:
# 3 ASCs + generalized cost + terminal time + household income
num_params = 6
initial_values = np.zeros(num_params)
# Estimate the mnl_model
mnl_model.fit_mle(initial_values)
# View the estimation results
mnl_model.get_statsmodels_summary()
Out[4]:
One way to allow the variances of the utilities of the various alternatives to differ is to add a mean-zero normal error component to each utility. One's utility equations would then be as follows:
$\begin{align}
U_{ij} &= \alpha _{\textrm{air}} d_{i,\ j=\textrm{air}} +
\alpha _{\textrm{train}} d_{i,\ j=\textrm{train}} +
\alpha _{\textrm{bus}} d_{i,\ j=\textrm{bus}} + \\
&\quad \ \beta _{\textrm{generalized_cost}} \textrm{GC}_{ij} +
\beta _{\textrm{terminal_time}} \textrm{TTME}_{ij} +
\beta _{\textrm{household_income}} d_{i,\ j=\textrm{air}} \textrm{HINC}_{i} +\\
&\quad \ \sigma_{\textrm{air}} d_{i,\ j=\textrm{air}} \eta_{ij} + \sigma_{\textrm{train}} d_{i,\ j=\textrm{train}} \eta_{ij} + \sigma_{\textrm{bus}} d_{i,\ j=\textrm{bus}} \eta_{ij} + \sigma_{\textrm{car}} d_{i,\ j=\textrm{car}} \eta_{ij} + \epsilon_{ij}\\
\textrm{where }& \eta_{ij} \sim N\left(0, 1\right)
\end{align}
$
There are two things to note about the utility equations above. First, as noted by Walker et al. (2007), not all of the $\sigma_j$'s are identifiable. As demonstrated below, this will be resolved by constraining the smallest $\sigma_j$ to zero. Second, the utility equations above are formally equivalent to a random-coefficients model in which the alternative specific constants are viewed as being randomly distributed. This can be seen by re-writing the utility equations as follows:
$\begin{align}
U_{ij} &= \beta _{\textrm{generalized_cost}} \textrm{GC}_{ij} +
\beta _{\textrm{terminal_time}} \textrm{TTME}_{ij} +
\beta _{\textrm{household_income}} d_{i,\ j=\textrm{air}} \textrm{HINC}_{i} +\\
&\quad \ d_{i,\ j=\textrm{air}} \tilde{\eta}_{ij} + d_{i,\ j=\textrm{train}} \tilde{\eta}_{ij} + d_{i,\ j=\textrm{bus}} \tilde{\eta}_{ij} + d_{i,\ j=\textrm{car}} \tilde{\eta}_{ij} + \epsilon_{ij}\\
\textrm{where }& \tilde{\eta}_{ij} \sim N\left(\alpha _{j}, \sigma _{j}^{2}\right)\\
&\quad \ j \in \left\lbrace \textrm{air, train, bus, car}\right\rbrace\\
&\quad \ \alpha _{\textrm{car}} = 0
\end{align}
$
From this perspective, using a mixed-logit model to account for heteroskedasticity is equivalent to using a mixed-logit model where the alternative-specific constants are viewed as being randomly distributed throughout the population.
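To make this equivalence concrete, below is a minimal numerical sketch (using hypothetical values for $\alpha$ and $\sigma$) showing that a fixed constant plus a scaled mean-zero normal draw has the same distribution as a single normal draw with mean $\alpha$ and standard deviation $\sigma$:
In [ ]:
# A quick numerical check of the equivalence, using
# hypothetical values for alpha and sigma.
rng = np.random.RandomState(12)
alpha, sigma = 1.5, 0.8
num_sim_draws = 100000
# Formulation 1: fixed constant plus scaled mean-zero error component
version_1 = alpha + sigma * rng.normal(size=num_sim_draws)
# Formulation 2: normally distributed constant with mean alpha
version_2 = rng.normal(loc=alpha, scale=sigma, size=num_sim_draws)
# Both sets of draws should have approximately the same
# mean and standard deviation
print(version_1.mean(), version_1.std())
print(version_2.mean(), version_2.std())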
Programmatically, PyLogit implements mixed logit models as random-coefficients models, so we will use this second viewpoint to specify the model to be estimated.
In particular, we will instruct PyLogit to treat the alternative specific constants as normally distributed random variables, and we will estimate the means and standard deviations of their distributions. Note that in order to estimate $\sigma_{\textrm{car}}$ for the car alternative, we need to have a variable for the alternative specific constant of the car. We will therefore add the car to the intercept specification, but we will then constrain the car's ASC (the mean of its distribution) to zero during estimation, as noted in the equation above.
As in Walker et al. (2007), we will first attempt to estimate all of the standard deviations. Then (for identification purposes) we will re-estimate the model, constraining to zero the smallest standard deviation from the first estimation.
Sources:
Walker, Joan L., Moshe Ben-Akiva, and Denis Bolduc. "Identification of parameters in normal error component logit-mixture (NECLM) models." Journal of Applied Econometrics, 22(6):1095–1125, September 2007. ISSN 1099-1255. doi: 10.1002/jae.971. URL: http://onlinelibrary.wiley.com/doi/10.1002/jae.971/abstract
In [5]:
# Create new specification and name dictionaries that
# also include an intercept for the car alternative
heteroskedastic_specification = OrderedDict()
heteroskedastic_names = OrderedDict()
heteroskedastic_specification["intercept"] = [1, 2, 3, 4]
heteroskedastic_names["intercept"] = ["ASC Air", "ASC Train", "ASC Bus", "ASC Car"]
for key in specification_dict:
    if key != "intercept":
        heteroskedastic_specification[key] = specification_dict[key]
        heteroskedastic_names[key] = name_dict[key]
# Create a list of the variables whose standard deviations
# are to be estimated
mixing_variables = heteroskedastic_names["intercept"]
heteroskedastic_model_0 = pl.create_choice_model(data=mode_df,
                                                 alt_id_col="mode",
                                                 obs_id_col="individual",
                                                 choice_col="choice",
                                                 specification=heteroskedastic_specification,
                                                 model_type="Mixed Logit",
                                                 names=heteroskedastic_names,
                                                 mixing_id_col="individual",
                                                 mixing_vars=mixing_variables)
# Create the initial values for the parameters being estimated.
# Note that we start at the MNL estimates and that we need
# initial values for the standard deviation parameters as well
# as for the intercept of car (which we'll constrain to zero)
initial_values = np.concatenate((mnl_model.params.values[:3],
                                 np.zeros(1),  # for the car intercept
                                 mnl_model.params.values[3:],
                                 np.zeros(len(mixing_variables))))
# Estimate the Mixed logit model.
# We set a seed for reproducibility, and we constrain the intercept
# for the car utility, i.e. position 4 (or in python, array index 3).
heteroskedastic_model_0.fit_mle(initial_values,
                                seed=26,
                                num_draws=500,
                                constrained_pos=[3])
# View the estimation results
heteroskedastic_model_0.get_statsmodels_summary()
Out[5]:
From the estimation results above, we can see that $\sigma_{\textrm{bus}}$ is the smallest standard deviation. We will therefore constrain this $\sigma_j$ to zero and re-estimate the model. Note also that the estimated $\sigma_j$ values may be reported as either positive or negative. This is because PyLogit performs an unconstrained maximization, and since the normal distribution is symmetric, flipping the sign of a $\sigma_j$ leaves the likelihood unchanged; only the magnitude of each $\sigma_j$ is meaningful.
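Accordingly, when interpreting the standard deviation parameters, one can simply take their absolute values. A minimal sketch, under the assumption that the four standard deviation parameters are the last four entries of the fitted parameter series:
In [ ]:
# Only the magnitudes of the sigma parameters matter, so take
# their absolute values for interpretation. This assumes the four
# standard deviation parameters are the final four entries of the
# estimated parameter series.
sigma_estimates = np.abs(heteroskedastic_model_0.params.values[-4:])
print(sigma_estimates)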
In [6]:
# Initialize a new model
heteroskedastic_model_1 = pl.create_choice_model(data=mode_df,
                                                 alt_id_col="mode",
                                                 obs_id_col="individual",
                                                 choice_col="choice",
                                                 specification=heteroskedastic_specification,
                                                 model_type="Mixed Logit",
                                                 names=heteroskedastic_names,
                                                 mixing_id_col="individual",
                                                 mixing_vars=mixing_variables)
# Create the initial values for the parameters being estimated.
# Note that we start at the MNL estimates and that we need
# initial values for the standard deviation parameters as well
# as for the intercept of car (which we'll constrain to zero)
initial_values = np.concatenate((mnl_model.params.values[:3],
                                 np.zeros(1),  # for the car intercept
                                 mnl_model.params.values[3:],
                                 np.zeros(len(mixing_variables))))
# Estimate the Mixed logit model.
# We set a seed for reproducibility, and we constrain the intercept
# for the car utility, i.e. position 4 (or in python, array index 3),
# as well as the standard deviation for the bus, i.e. array index -2.
heteroskedastic_model_1.fit_mle(initial_values,
                                seed=26,
                                num_draws=500,
                                constrained_pos=[3, -2])
# View the estimation results
heteroskedastic_model_1.get_statsmodels_summary()
Out[6]:
Given that the estimated $\sigma_{\textrm{train}}$ and $\sigma_{\textrm{car}}$ parameters are so close to zero and have such high p-values, we eliminate those variables and only retain $\sigma_{\textrm{air}}$.
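Before re-estimating, we can verify this by inspecting the standard deviation estimates and their p-values directly. A minimal sketch, assuming (as the summary output suggests) that the fitted model exposes its estimates and p-values as pandas Series named `params` and `pvalues`, with the standard deviation parameters last:
In [ ]:
# Look at the standard deviation estimates and their p-values.
# This assumes `params` and `pvalues` are pandas Series whose
# final four entries are the standard deviation parameters.
print(heteroskedastic_model_1.params.iloc[-4:])
print(heteroskedastic_model_1.pvalues.iloc[-4:])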
In [7]:
# Initialize a new model. Note we use the original specification and
# original name dictionary because we are not estimating a Sigma ASC Car
# and those dictionaries already contain all the needed information to
# estimate Sigma ASC Air
heteroskedastic_model_2 = pl.create_choice_model(data=mode_df,
                                                 alt_id_col="mode",
                                                 obs_id_col="individual",
                                                 choice_col="choice",
                                                 specification=specification_dict,
                                                 model_type="Mixed Logit",
                                                 names=name_dict,
                                                 mixing_id_col="individual",
                                                 mixing_vars=["ASC Air"])
# Create the initial values for the parameters being estimated.
# Note that we start at the estimates from the last model, using
# the estimated ASCs (minus the car ASC), the generalized cost,
# terminal time, and household income coefficients, and Sigma Air.
initial_values = np.concatenate((heteroskedastic_model_1.params.values[:3],
                                 heteroskedastic_model_1.params.values[4:8]))
# Estimate the Mixed logit model. The seed is for reproducibility
heteroskedastic_model_2.fit_mle(initial_values,
                                seed=26,
                                num_draws=500)
# View the estimation results
heteroskedastic_model_2.get_statsmodels_summary()
Out[7]:
In certain cases, one might wish to allow for correlated utilities. In particular, we use the $\epsilon_{ij}$ terms to represent the unobserved factors that affect the utility gained by individual $i$ from choosing alternative $j$. It may be quite reasonable to expect the unobserved factors that contribute to the utility of alternative $j$ (say the train alternative) to be correlated with the unobserved factors that contribute to the utility of alternative $k$ (say the bus alternative). For instance, we do not observe factors such as the access and egress times to the terminals for the train and bus modes. Given that these ground transportation modes require smaller terminals than airplanes, the train and bus terminals are likely close to the centers of downtown Sydney and downtown Melbourne, and thus closer to one's destination than an airport on the outskirts of the cities. Since these unobserved factors are likely to be correlated, the utilities of the two modes are likely to be correlated.
One way to account for such correlation is through the addition of shared error components in one's utility equations. In our example specifically, we can add an error component shared by the train and bus utilities as follows:
$\begin{align}
U_{ij} &= \beta _{\textrm{generalized_cost}} \textrm{GC}_{ij} +
\beta _{\textrm{terminal_time}} \textrm{TTME}_{ij} +
\beta _{\textrm{household_income}} d_{i,\ j=\textrm{air}} \textrm{HINC}_{i} +\\
&\quad \ d_{i,\ j=\textrm{air}} \tilde{\eta}_{ij} +
\alpha _{\textrm{train}} d_{i,\ j=\textrm{train}} +
\alpha _{\textrm{bus}} d_{i,\ j=\textrm{bus}} +
\sigma_{\textrm{nest}} d_{i,\ j \in \textrm{(train, bus)}} \xi_{i} + \epsilon_{ij}\\
\textrm{where }& \tilde{\eta}_{ij} \sim N\left(\alpha _{\textrm{air}}, \sigma_{\textrm{air}}^{2}\right)\\
& \xi_{i} \sim N\left( 0, 1 \right)
\end{align}
$
Note that, because of the estimation results above, we have removed the $\sigma_j$ parameters for all $j \neq \textrm{air}$, so the train and bus constants are now fixed rather than random.
As before, we will need to implement this specification from a random-coefficients perspective, so we will add a dummy variable to the model that is $1$ if the alternative is train or bus, and $0$ otherwise. Then, treating the coefficient on this dummy variable as a random coefficient, we will constrain its mean to be zero and estimate its standard deviation. This is analogous to the way that we estimated the standard deviation for the car alternative, even though the "ASC Car" coefficient was constrained to be zero.
In [8]:
# Add a "nesting variable" to the dataset by first
# initializing the variable for all modes, then
# setting the veriable equal to 1, only for the
# train (2) and bus(3) modes
mode_df["train_bus_nesting_dummy"] = 0
mode_df.loc[mode_df["mode"].isin([2, 3]),
"train_bus_nesting_dummy"] = 1
# Initialize the specification and name dictionaries
# for the model to be estimated.
nesting_specification = OrderedDict()
nesting_names = OrderedDict()
# As with 'heteroskedastic model 2', we build off the
# original specification and name dictionaries because
# we are not estimating a value for Sigma ASC Car.
for key in specification_dict:
    nesting_specification[key] = specification_dict[key]
    nesting_names[key] = name_dict[key]
# Note that we are adding a single, subset-specific coefficient
# for the nesting dummy. This coefficient (which will be
# constrained to zero) will be the mean of the randomly
# distributed nesting variable.
nesting_specification["train_bus_nesting_dummy"] = [[2, 3]]
nesting_names["train_bus_nesting_dummy"] = ["Nesting Parameter"]
In [9]:
# Create a list of the variables whose standard deviations
# are to be estimated
mixing_vars_nesting = ["ASC Air", "Nesting Parameter"]
nesting_model = pl.create_choice_model(data=mode_df,
                                       alt_id_col="mode",
                                       obs_id_col="individual",
                                       choice_col="choice",
                                       specification=nesting_specification,
                                       model_type="Mixed Logit",
                                       names=nesting_names,
                                       mixing_id_col="individual",
                                       mixing_vars=mixing_vars_nesting)
# Create the initial values for the parameters
# being estimated.
initial_values = np.concatenate((heteroskedastic_model_2.params.values[:6],
                                 np.zeros(1),  # for the nesting dummy
                                 heteroskedastic_model_2.params.values[-1:],
                                 np.zeros(1)))  # for the Sigma Nesting Dummy
# Estimate the Mixed logit model.
# We set a seed for reproducibility, and we constrain the mean of the
# nesting dummy's coefficient, i.e. position 7 (or in python, array index 6).
nesting_model.fit_mle(initial_values,
                      seed=28,
                      num_draws=500,
                      constrained_pos=[6])
# View the estimation results
nesting_model.get_statsmodels_summary()
Out[9]:
As the estimation results show, $\sigma_{\textrm{Nesting Parameter}}$ is not significantly different from zero at any common alpha level. This indicates that the utilities of the train and bus alternatives may not actually be correlated, contrary to what we might have guessed a priori. This result could have been anticipated by noting that this model is a restricted version of the "Heteroskedastic Model 1" estimated earlier: here we restrict the $\sigma_{\textrm{train}}$ and $\sigma_{\textrm{bus}}$ terms to share a single value, which we call $\sigma_{\textrm{Nesting Parameter}}$. Given that neither parameter in the unrestricted model was statistically significantly different from zero, we could anticipate that the shared parameter in the restricted model would also not be statistically significantly different from zero.
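This comparison can be made slightly more formal with a likelihood ratio test of the restricted (nesting) model against the unrestricted "Heteroskedastic Model 1". The sketch below assumes that the fitted PyLogit models store their final log-likelihoods in a `log_likelihood` attribute (an assumption about the API), and since the log-likelihoods are simulated, the test should be treated as approximate:
In [ ]:
from scipy import stats

# Approximate likelihood ratio test. The restricted (nesting) model
# has one fewer free parameter than the unrestricted model, so the
# statistic is compared to a chi-squared distribution with one
# degree of freedom.
# NOTE: `log_likelihood` is assumed to be the attribute holding each
# fitted model's final log-likelihood.
ll_unrestricted = heteroskedastic_model_1.log_likelihood
ll_restricted = nesting_model.log_likelihood
lr_statistic = 2 * (ll_unrestricted - ll_restricted)
p_value = stats.chi2.sf(lr_statistic, df=1)
print("LR statistic: {:.3f}, p-value: {:.3f}".format(lr_statistic, p_value))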